Constrained large solutions to Leray's problem in a distorted strip with the Navier-slip boundary condition
In this paper, we solve Leray's problem for the stationary
Navier-Stokes system in a 2D infinite distorted strip with the Navier-slip
boundary condition. The existence, uniqueness, regularity and asymptotic
behavior of the solution will be investigated. Moreover, we discuss how the
friction coefficient affects the well-posedness of the solution. Due to the
validity of Korn's inequality, all constants in each a priori estimate are
independent of the friction coefficient. The main novelty is that the total flux of
the velocity can be relatively large (proportional to the {\it slip length})
when the friction coefficient of the Navier-slip boundary condition is small,
which is essentially different from the 3D case. Comment: 45 pages. arXiv admin note: text overlap with arXiv:2204.10578. A
remark is added to state the independent accomplishment of solving the 2D
Leray's problem with the Navier-slip boundary condition by our group and
Professor Chunjing Xie's group.
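For context, the Navier-slip boundary condition referred to above is commonly written in the following standard form (notation assumed here; the paper's exact scaling of the friction coefficient may differ):

\[
  \mathbf{u}\cdot\mathbf{n} = 0, \qquad 2\big(\mathbb{D}(\mathbf{u})\,\mathbf{n}\big)_{\tau} + \alpha\,\mathbf{u}_{\tau} = 0 \quad \text{on } \partial\Omega, \qquad \mathbb{D}(\mathbf{u}) = \tfrac12\big(\nabla\mathbf{u} + (\nabla\mathbf{u})^{\top}\big),
\]

where \mathbf{n} is the unit outward normal, the subscript \tau denotes the tangential component, and \alpha \ge 0 is the friction coefficient; the slip length scales like 1/\alpha, so a small friction coefficient corresponds to a large slip length and hence to the large admissible flux described above.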
Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers
Transformers have achieved great success in machine learning applications.
Normalization techniques, such as Layer Normalization (LayerNorm, LN) and Root
Mean Square Normalization (RMSNorm), play a critical role in accelerating and
stabilizing the training of Transformers. While LayerNorm recenters and
rescales input vectors, RMSNorm only rescales the vectors by their RMS value.
Despite being more computationally efficient, RMSNorm may compromise the
representation ability of Transformers. There is currently no consensus
regarding the preferred normalization technique, as some models employ
LayerNorm while others utilize RMSNorm, especially in recent large language
models. It is challenging to convert Transformers from one normalization type to
the other. While there is an ongoing disagreement between the two
normalization types, we propose a solution to unify two mainstream Transformer
architectures, Pre-LN and Pre-RMSNorm Transformers. By removing the inherent
redundant mean information in the main branch of Pre-LN Transformers, we can
reduce LayerNorm to RMSNorm, achieving higher efficiency. We further propose
the Compressed RMSNorm (CRMSNorm) and Pre-CRMSNorm Transformer based on a
lossless compression of the zero-mean vectors. We formally establish the
equivalence of Pre-LN, Pre-RMSNorm, and Pre-CRMSNorm Transformer variants in
both training and inference. This implies that Pre-LN Transformers can be
substituted with Pre-(C)RMSNorm counterparts at almost no cost, offering the
same arithmetic functionality along with a free efficiency improvement.
Experiments demonstrate that we can reduce the training and inference time of
Pre-LN Transformers by up to 10%. Comment: 15 pages, 5 tables, code available at
https://github.com/ZixuanJiang/pre-rmsnorm-transforme
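As a rough illustration of the reduction described above (a minimal sketch with assumed names, not the repository's implementation), the following check shows the arithmetic fact the equivalence relies on: once the main-branch vectors are kept zero-mean, LayerNorm without a bias term coincides with RMSNorm.

# Illustrative check only; not the repository's code.
import numpy as np

def layer_norm(x, gamma, eps=1e-6):
    # Recenters and rescales along the last dimension (no bias term).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps)

def rms_norm(x, gamma, eps=1e-6):
    # Only rescales by the root-mean-square value.
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return gamma * x / rms

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
x_centered = x - x.mean(axis=-1, keepdims=True)  # remove the redundant mean information
gamma = rng.normal(size=16)

# Prints True: for zero-mean inputs the two normalizations agree numerically.
print(np.allclose(layer_norm(x_centered, gamma), rms_norm(x_centered, gamma), atol=1e-6))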
The Influencing Path of Public Engaging Intention in the Value Co-Creation of E-Gov Services: An Empirical Investigation
The wide acceptance of ICTs and social media enriches the delivery platforms of e-gov services (EGS). EGS is an important interaction and collaboration channel between the government and the public. The public can conveniently and promptly explore problems, provide ideas, and design solutions to improve EGS. The role of the public has shifted to that of active, informed partners or co-creators in EGS innovation and problem solving. This study builds an influencing-factor model of public engaging intention in the value co-creation of EGS, based on technology acceptance theory, trust theory, and motivation theory, to explore impact factors and impact paths. Path analysis interprets how the public would accept and adopt value co-creation behavior for EGS. This study also introduces a comprehensive picture of the new paradigm of public service value creation in an era of increasing user dominance, that is, the public
Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes
We present a multi-view inverse rendering method for large-scale real-world
indoor scenes that reconstructs global illumination and physically-reasonable
SVBRDFs. Unlike previous representations, where the global illumination of
large scenes is simplified as multiple environment maps, we propose a compact
representation called Texture-based Lighting (TBL). It consists of 3D meshes and
HDR textures, and efficiently models direct and infinite-bounce indirect
lighting of the entire large scene. Based on TBL, we further propose a hybrid
lighting representation with precomputed irradiance, which significantly
improves the efficiency and alleviates the rendering noise in the material
optimization. To physically disentangle the ambiguity between materials, we
propose a three-stage material optimization strategy based on the priors of
semantic segmentation and room segmentation. Extensive experiments show that
the proposed method outperforms state-of-the-art methods quantitatively and
qualitatively, and enables physically-reasonable mixed-reality applications
such as material editing, editable novel view synthesis and relighting. The
project page is at https://lzleejean.github.io/TexIR.Comment: The project page is at: https://lzleejean.github.io/TexI
Photonic Floquet time crystals
The public and scientists constantly have different perspectives. On time
crystals, however, they stand in line and ask: What is a time crystal? Show me a
material that is spontaneously crystalline in time. This study synthesizes a
photonic material of Floquet time crystals and experimentally observes its
indicative period-2T beating. We explicitly reconstruct a discrete
time-crystalline ground state and, using an appropriately designed photonic
Floquet simulator, reveal the rigid period-doubling as a signature of the
spontaneous breaking of the discrete time-translational symmetry. Unlike time
crystals that result from exquisite many-body interactions, our photonic time
crystal is derived from a single-particle topological phase that can be widely
accessed by many pertinent nonequilibrium and periodically-driven platforms.
Our observation will drive theoretical and technological interest toward
condensed matter physics and topological photonics, and demystify time crystals
for the non-scientific public. Comment: 39 pages, 5 figures, supplementary materials, 6 suppl. figures
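In the standard language of Floquet theory (general background, not a detail taken from this abstract), the observed period-2T beating corresponds to a state that is not invariant under one driving period but is invariant under two:

\[
  U(T)\,\lvert\psi\rangle \neq e^{i\varphi}\,\lvert\psi\rangle, \qquad U(2T)\,\lvert\psi\rangle = e^{i\theta}\,\lvert\psi\rangle,
\]

where U(T) is the evolution operator over one driving period; the discrete time-translation symmetry t \to t + T of the drive is spontaneously broken down to t \to t + 2T, which is what the rigid period-doubling signals.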
An Image Dataset for Benchmarking Recommender Systems with Raw Pixels
Recommender systems (RS) have achieved significant success by leveraging
explicit identification (ID) features. However, the full potential of content
features, especially the pure image pixel features, remains relatively
unexplored. The limited availability of large, diverse, and content-driven
image recommendation datasets has hindered the use of raw images as item
representations. In this regard, we present PixelRec, a massive image-centric
recommendation dataset that includes approximately 200 million user-image
interactions, 30 million users, and 400,000 high-quality cover images. By
providing direct access to raw image pixels, PixelRec enables recommendation
models to learn item representations directly from them. To demonstrate its
utility, we begin by presenting the results of several classical pure ID-based
baseline models, termed IDNet, trained on PixelRec. Then, to show the
effectiveness of the dataset's image features, we substitute the itemID
embeddings (from IDNet) with a powerful vision encoder that represents items
using their raw image pixels. This new model is dubbed PixelNet. Our findings
indicate that even in standard, non-cold start recommendation settings where
IDNet is recognized as highly effective, PixelNet can already perform as well
as or even better than IDNet. Moreover, PixelNet has several other notable
advantages over IDNet, such as being more effective in cold-start and
cross-domain recommendation scenarios. These results underscore the importance
of visual features in PixelRec. We believe that PixelRec can serve as a
critical resource and testing ground for research on recommendation models that
emphasize image pixel content. The dataset, code, and leaderboard will be
available at https://github.com/westlake-repl/PixelRec
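The IDNet-to-PixelNet substitution can be pictured with a short sketch (module names and shapes are assumptions, not the released PixelRec code): the item-ID embedding table is swapped for a vision encoder applied to the raw cover image.

# Hypothetical module names; not the released PixelRec code.
import torch.nn as nn

class IDNetItemTower(nn.Module):
    # Classical ID-based item representation: a lookup table over item IDs.
    def __init__(self, num_items, dim):
        super().__init__()
        self.emb = nn.Embedding(num_items, dim)

    def forward(self, item_ids, item_images=None):
        return self.emb(item_ids)

class PixelNetItemTower(nn.Module):
    # PixelNet-style item representation: encode the raw cover image instead.
    def __init__(self, vision_encoder, enc_dim, dim):
        super().__init__()
        self.encoder = vision_encoder        # e.g. a ViT or ResNet backbone
        self.proj = nn.Linear(enc_dim, dim)  # project into the recommender's item space

    def forward(self, item_ids, item_images=None):
        return self.proj(self.encoder(item_images))

Under this substitution the rest of the recommender (user tower, training objective) can stay unchanged, which keeps the IDNet-versus-PixelNet comparison direct.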
Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction
Neural surface reconstruction aims to reconstruct accurate 3D surfaces based
on multi-view images. Previous methods based on neural volume rendering mostly
train a fully implicit model with MLPs, which typically require hours of
training for a single scene. Recent efforts explore the explicit volumetric
representation to accelerate the optimization via memorizing significant
information with learnable voxel grids. However, existing voxel-based methods
often struggle in reconstructing fine-grained geometry, even when combined with
an SDF-based volume rendering scheme. We reveal that this is because 1) the
voxel grids tend to break the color-geometry dependency that facilitates
fine-geometry learning, and 2) the under-constrained voxel grids lack spatial
coherence and are vulnerable to local minima. In this work, we present Voxurf,
a voxel-based surface reconstruction approach that is both efficient and
accurate. Voxurf addresses the aforementioned issues via several key designs,
including 1) a two-stage training procedure that attains a coherent coarse
shape and recovers fine details successively, 2) a dual color network that
maintains color-geometry dependency, and 3) a hierarchical geometry feature to
encourage information propagation across voxels. Extensive experiments show
that Voxurf achieves high efficiency and high quality at the same time. On the
DTU benchmark, Voxurf achieves higher reconstruction quality with a 20x
training speedup compared to previous fully implicit methods.
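As background on what "learnable voxel grids" means here, a minimal sketch of querying an explicit SDF grid by trilinear interpolation is shown below (assumed setup for illustration; not Voxurf's actual code):

# Assumed setup for illustration; not Voxurf's implementation.
import torch
import torch.nn.functional as F

class SDFVoxelGrid(torch.nn.Module):
    def __init__(self, resolution=128):
        super().__init__()
        # One scalar SDF value per voxel, optimized directly as a parameter.
        self.grid = torch.nn.Parameter(torch.zeros(1, 1, resolution, resolution, resolution))

    def forward(self, pts):
        # pts: (N, 3) query points in [-1, 1]^3, ordered (x, y, z).
        coords = pts.view(1, -1, 1, 1, 3)
        sdf = F.grid_sample(self.grid, coords, mode="bilinear", align_corners=True)
        return sdf.view(-1)  # (N,) trilinearly interpolated SDF values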
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
In the realm of large multi-modal models (LMMs), efficient modality alignment
is crucial yet often constrained by the scarcity of high-quality image-text
data. To address this bottleneck, we introduce the ShareGPT4V dataset, a
pioneering large-scale resource featuring 1.2 million highly descriptive
captions, which surpasses existing datasets in diversity and information
content, covering world knowledge, object properties, spatial relationships,
and aesthetic evaluations. Specifically, ShareGPT4V originates from a curated
set of 100K high-quality captions collected from the advanced GPT4-Vision and
has been expanded to 1.2M with a superb caption model trained on this subset.
ShareGPT4V
first demonstrates its effectiveness for the Supervised Fine-Tuning (SFT)
phase, by substituting an equivalent quantity of detailed captions in existing
SFT datasets with a subset of our high-quality captions, significantly
enhancing LMMs such as LLaVA-7B, LLaVA-1.5-13B, and Qwen-VL-Chat-7B on the MME
and MMBench benchmarks, with respective gains of 222.8/22.0/22.3 and
2.7/1.3/1.5. We further incorporate ShareGPT4V data into both the pre-training
and SFT phases, obtaining ShareGPT4V-7B, a superior LMM based on a simple
architecture that has remarkable performance across a majority of the
multi-modal benchmarks. This project is available at
https://ShareGPT4V.github.io to serve as a pivotal resource for advancing the
LMMs community. Comment: Project: https://ShareGPT4V.github.i
A compact butterfly-style silicon photonic-electronic neural chip for hardware-efficient deep learning
The optical neural network (ONN) is a promising hardware platform for
next-generation neurocomputing due to its high parallelism, low latency, and
low energy consumption. Previous ONN architectures are mainly designed for
general matrix multiplication (GEMM), leading to unnecessarily large area cost
and high control complexity. Here, we move beyond classical GEMM-based ONNs and
propose an optical subspace neural network (OSNN) architecture, which trades
the universality of weight representation for lower optical component usage,
area cost, and energy consumption. We devise a butterfly-style
photonic-electronic neural chip to implement our OSNN with up to 7x fewer
trainable optical components compared to GEMM-based ONNs. Additionally, a
hardware-aware training framework is provided to minimize the required device
programming precision, lessen the chip area, and boost the noise robustness. We
experimentally demonstrate the utility of our neural chip in practical image
recognition tasks, showing that a measured accuracy of 94.16% can be achieved
in hand-written digit recognition tasks with 3-bit weight programming
precision. Comment: 17 pages, 5 figures
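To illustrate what training toward a low device programming precision can look like, here is a generic 3-bit quantization-aware sketch with a straight-through estimator (a common technique shown for illustration; not the authors' hardware-aware framework):

# Generic illustration; not the authors' hardware-aware training framework.
import torch

def quantize_weight(w: torch.Tensor, bits: int = 3) -> torch.Tensor:
    # Uniformly quantize weights to 2**bits levels within their observed range.
    levels = 2 ** bits - 1
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min).clamp(min=1e-8) / levels
    q = torch.round((w - w_min) / scale) * scale + w_min
    # Straight-through estimator: the forward pass uses the quantized weights,
    # while gradients flow to the full-precision weights w.
    return w + (q - w).detach()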